Systems and Methods For Decoding Payer Identification In Health Care Data Records

ABSTRACT

Processing arrangements and methods are provided for the automated decoding or translation of information in healthcare data records, which are coded in non-standardized or varying formats. A data record which contains information, a portion of which is recognized and another portion of which is new, is decoded or translated using a statistical mapping rule. The mapping rule assigns a most likely translation value to the information based on the recognized portion of the information. The statistical mapping rules are established by analysis of a set of previously decoded data records.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/782,423, filed May 18, 2010, which is a continuation of U.S. patent application Ser. No. 10/893,838, filed Jul. 19, 2004, which claims the benefit of U.S. provisional patent application Ser. No. 60/488,692, filed Jul. 18, 2003.

BACKGROUND OF THE INVENTION

The present invention relates to techniques for analysis of health care and pharmaceutical data. The invention in particular relates to the correlation of specific managed health care payers and plans with prescription data records that contain non-standardized health care payer and plan identifiers.

Prescription data records that are generated by retail pharmacies or hospital dispensaries, for example, when they fill prescriptions for customers, contain labels or data fields that include information identifying the party responsible for authorizing and/or making payments for the prescriptions. Useful market intelligence may be derived from statistical or other analysis of the responsible party information and other information in the prescription data records. The useful market intelligence may, for example, include competitive assessments of the marketing and sales of a particular product, which may be of interest to a pharmaceutical concern, or health care provider or agency.

The prescription data records may include information, which relates to the party responsible for authorizing and/or making payments, in one or more data fields such as Bank Identification Numbers (“BINs”), Processor Control Numbers (“PCNs”), and health care plan Group Identification Numbers (“Group IDs”). The BIN data field may, for example, contain a six-digit number that codes information about the adjudicator of the prescription drug claim or script.

Unfortunately, the type and number of such data fields may vary with each generator or source of the prescription data records. The prescription data record formats also may change over time. Further, the information in the data records is often coded in a non-standardized format. The labels and other coded information in the prescription data records must be decoded before full analysis of the data records can take place. In practice, a market research organization or other party analyzing the prescription data records may undertake to build a glossary or dictionary of the labels or codes that are found in data fields such as BIN, PCN or Group ID.

The market research organization may manually verify the codes entered in the glossary or dictionary. On encountering a new label or code in a prescription data record, the market research organization may, for example, make manual inquiries (e.g., via telephone calls) to individual retail pharmacy organizations or pharmacy benefit management companies (“PBMs”) in order to verify the meaning of the code. Such manual verification procedures can be both laborious and expensive. Furthermore, the manual verification procedures may not always be successful or complete. The success of the manual verification procedures depends on the responsiveness of the third parties, who may not be obligated to respond.

Consideration is now being given to ways of enhancing procedures for decoding information contained in prescription data records. Attention is directed to procedures for verifying the meaning of codes and labels in prescription data records that relate, for example, to the identity of parties responsible for authorizing and/or making payments. The desirable procedures may be automated, thereby minimizing the need to contact other parties for code verification.

SUMMARY OF THE INVENTION

In accordance with the present invention, data processing arrangements and automated procedures are provided for translating the varying codes and labels that are used in prescription data records to identify or mark involved parties.

The inventive data processing arrangement develops translation or mapping rules based on a set of previously mapped data records. A previously mapped data record in the set may have been assigned a translation value or “ID” based on the content or values of a plurality of data fields (e.g., three data fields) in the data record. The data processing arrangement is configured to first identify unique combinations of the values of a lesser number of the data fields (e.g., two data fields) occurring in the set of previously mapped data records. Then, for each unique combination of the values of the lesser number of data fields, the data processing arrangement determines the statistical frequency of ID assignments in the set of previously mapped data records. Mapping rules, which assign a frequently occurring ID to other data records based on the content of the lesser number of data fields (e.g., two data fields), are then established. Thus, even when the values of a data field (e.g., the third data field) in a data record are not recognized or are new, the mapping rules allow assignment of an ID to the data record based on recognized combinations of the values of the lesser number of data fields. The mapping rules may be validated or verified by data suppliers, and assembled for use in a look-up table.

In a preferred embodiment of the invention, the frequently occurring ID assigned by the mapping rules is the most frequently occurring or most likely ID found in the previously mapped set of data records.

Further features of the invention, its nature, and various advantages will be more apparent from the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 a and 1 b are flow diagrams, which illustrate several of the steps in an exemplary procedure for verifying payer identification codes in prescription data records, in accordance with the principles of the present invention.

FIG. 2 is a schematic representation of the format of a prescription transaction data record.

Throughout the figures, unless otherwise stated, the same reference numerals and characters are used to denote like features, elements, components, or portions of the illustrated embodiments.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides solutions for translating or decoding information in health care data records, which is coded in non-standardized or varying formats. The solutions may be implemented in conventional computer data processing arrangements so that electronic health care data records can be processed automatically. The solutions utilize continuous learning algorithms to decode new information, which at least partially overlaps with the previously mapped or decoded information. The learning algorithms exploit previously validated mappings of information as bridges to decode and establish mapping rules for the partially overlapping new information. The mapping rules for the new information may be validated or verified on a statistical basis. The validated mapping rules are assembled in a bridge table or reference file for convenience in use.

For purposes of describing applications of the invention, prescription transaction data records are used as the exemplary health care data records herein. FIG. 2 shows the format of an exemplary prescription transaction record 200, which may be prepared by a retail pharmacy in the course of filling a prescription request by a customer. The prescription transaction data record may be part of an electronic data file that is assembled by a data supplier/vendor, and made available for analysis (e.g., to market researchers). Prescription transaction data record 200 may include one or more data fields (e.g., BIN, PCN or Group ID) that include coded information on the parties (e.g., banks, processors, health insurance plans) that are involved in authorizing or making payments for the prescription.

Data record 200 also may include an alternate or additional fourth field (“DS”), which includes data supplier specified codes for the payer (“DS payer codes”). The DS payer codes in a data record might be payer identification codes that are assigned to payers by independent choice or practice of the data supplier or data record preparer. Different suppliers and data record preparers may choose varying formats for the DS payer codes. Other data fields in data record 200 also may include other supplier/preparer specified indicia in varying formats.

FIGS. 1 a and 1 b show the steps of an exemplary learning procedure 100 for mapping non-standardized information in health care records. Procedure 100 may be used so that the non-standardized information may be used to categorically identify or associate specific third parties (e.g., specific managed health care payers and plans) with the specific health care records. Procedure 100 may be advantageously used by a market researcher or analyst (“MR”), for example, to relate specific managed health care payers and plans (“payer”) to specific prescription transaction data records. As a preliminary step, the MR may establish a comprehensive identification list of payers. Each payer in the list may be associated with a unique MR code (“MR payer code”). The MR payer code may, for example, be a six-character alphanumeric code.

At step 110, the MR acquires a file of prescription transaction data records (“Rx file”) for analysis. The prescription transaction records in the Rx file may, for example, include information on retail prescription transactions conducted in any suitable market region or segment of interest. Further, the prescription transaction data records in the Rx file may span any suitable time interval of interest (e.g., a week, month or quarter).

The data records in the Rx file may be acquired directly from retailers (e.g., independent pharmacies or drug store chains) by the MR itself or acquired using the services of intermediary commercial data vendors or suppliers. The prescription transaction data records may include one or more data fields (e.g., BIN, PCN, Group ID, DS) that include coded information on the prescription payers. (See, e.g., FIG. 2). The DS data field may include data supplier/preparer specified codes for the payer (“DS payer codes”) in varying formats. Additional data fields may include other supplier specified indicia. The data supplier may in some instances make available to the MR a glossary of the DS payer codes. The glossary may, for example, indicate that a code “xxyy” refers to “XYZ” health insurance company.

The data records in the Rx file or similar data records may previously have been processed using procedures other than procedure 100 (e.g., manual verification procedures or data supplier glossaries) to decode the identity of parties responsible for authorizing and/or making payments for at least some of the data records.

At step 120 of procedure 100, the prescription transaction data records in the Rx file are processed to retain only those transactions that involve a third party payer (i.e., a health care plan). At this step, all cash transactions, which, for example, are fully paid for by the retail customers, are excluded from the Rx file under analysis. Prescription transactions that are covered by Medicaid or other government initiatives also may be excluded from the Rx file. Thus, only “non-cash” transactions that include information about a third party payer are retained in the Rx file. The percentage of non-cash transactions in a typical Rx file under present pharmaceutical market conditions may be about 75%.
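By way of illustration only, the filtering of step 120 might be sketched as follows. The record layout, the field name payment_type, and its values are hypothetical; the specification does not define how cash or government-covered transactions are flagged in a data record.

```python
# Hypothetical field name and values; the specification defines no schema.
EXCLUDED_PAYMENT_TYPES = {"cash", "government"}


def retain_third_party(records, excluded=EXCLUDED_PAYMENT_TYPES):
    """Return only the records whose payment type is not excluded."""
    return [r for r in records if r["payment_type"] not in excluded]


rx_file = [
    {"rx_id": 1, "payment_type": "cash"},
    {"rx_id": 2, "payment_type": "third_party"},
    {"rx_id": 3, "payment_type": "government"},
    {"rx_id": 4, "payment_type": "third_party"},
]
non_cash = retain_third_party(rx_file)
```

Because the exclusion of government-covered transactions is described as optional, the excluded set is left as a parameter in this sketch.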

Next, at step 130, a “populated” set of prescription transaction records in which both of the BIN and Group ID data fields are populated is identified. Optionally, in some implementations of procedure 100, data transaction records which alternately or additionally have the PCN data fields populated also may be included in the “populated” set of data transaction records. Similarly, the populated set of data transaction records may further include data transactions in which the DS field is populated with data supplier specified payer codes. At associated step 130 a, the populated set of data records may be analyzed to identify the number of, and to update a list of, all unique combinations of BIN and Group ID values found in the data records. In implementations where PCN data fields are also considered, the identification of unique combinations of BIN and Group ID values may be extended to include PCN values as appropriate.
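A minimal sketch of steps 130 and 130 a is given below for illustration. It assumes each record is a dictionary with hypothetical keys bin, group_id and pcn, and that an unpopulated field is either missing, None, or blank; none of these conventions is stated in the specification.

```python
def is_populated(value):
    """Treat None and blank strings as unpopulated (an assumption)."""
    return value is not None and str(value).strip() != ""


def populated_set(records):
    """Keep records in which both the BIN and Group ID fields are populated."""
    return [
        r for r in records
        if is_populated(r.get("bin")) and is_populated(r.get("group_id"))
    ]


def unique_combinations(records, use_pcn=False):
    """Collect unique (BIN, Group ID) tuples, optionally extended with PCN."""
    keys = ("bin", "group_id", "pcn") if use_pcn else ("bin", "group_id")
    return {tuple(r.get(k) for k in keys) for r in records}


records = [
    {"bin": "123456", "group_id": "G1", "pcn": "P1"},
    {"bin": "123456", "group_id": "G1", "pcn": "P2"},
    {"bin": "123456", "group_id": "", "pcn": "P1"},
    {"bin": None, "group_id": "G2", "pcn": None},
    {"bin": "654321", "group_id": "G2", "pcn": None},
]
populated = populated_set(records)
combos = unique_combinations(populated)
```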

At step 140 of procedure 100, the populated set of prescription transaction records is separated into two subsets A and B. The first subset A includes prescription transaction records whose BIN and Group ID combinations have been previously mapped to corresponding payers (i.e., MR payer codes). These prescription transaction records may have been previously mapped using, for example, recognizable DS codes and supplier glossaries. Conversely, the second subset B includes prescription transaction data records whose DS codes or other supplier indicia have not been previously mapped to or associated with specific payers. In other words, the DS codes or other supplier indicia values are new or not recognized.

At optional step 150, the data records in subset A may be encoded with the mapped MR payer ID corresponding to the combinations of BIN and Group ID values in the records. Similarly, the data records in subset B may be encoded with dummy MR payer ID values (e.g., all 7s, 8s or 9s) to indicate that the DS codes or other supplier indicia in the records are new and have not yet been mapped to MR payer ID values.
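The separation and encoding of steps 140 and 150 might, for illustration, be sketched as follows. The sketch assumes one possible reading in which membership in subset A is decided by looking up the DS code of a record in a supplier glossary; the glossary structure, the key names, and the particular dummy value are hypothetical.

```python
# Dummy MR payer ID marking records whose supplier code is not yet mapped
# (an assumed value, chosen from the "all 9s" example).
DUMMY_MR_PAYER_ID = "999999"


def split_subsets(records, ds_glossary):
    """Split records into subset A (recognized DS code) and subset B (new).

    ds_glossary maps a DS payer code to an MR payer ID (hypothetical layout).
    Each returned record is a copy encoded with an "mr_payer_id" value.
    """
    subset_a, subset_b = [], []
    for r in records:
        mr_id = ds_glossary.get(r.get("ds"))
        if mr_id is not None:
            subset_a.append({**r, "mr_payer_id": mr_id})
        else:
            subset_b.append({**r, "mr_payer_id": DUMMY_MR_PAYER_ID})
    return subset_a, subset_b


glossary = {"xxyy": "MR0001"}
populated = [
    {"bin": "123456", "group_id": "G1", "ds": "xxyy"},
    {"bin": "123456", "group_id": "G1", "ds": "zz99"},
    {"bin": "654321", "group_id": "G2", "ds": None},
]
subset_a, subset_b = split_subsets(populated, glossary)
```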

In learning procedure 100, the subset A data records and associated mapping information are advantageously used to learn mapping rules that can be applied to the new data field combinations (e.g., those found in subset B data records).

The distribution of the number of data records in each subset A and B by data field value may then be computed. First, for example, at step 160 the mapped prescription data records in subset A may be sorted, binned or grouped for each unique combination of data field values (e.g., combinations of BIN and Group ID values, or combinations of BIN, Group ID and PCN values) present in subset A. Similarly, the unmapped prescription transaction data records in subset B may be sorted, binned or grouped for each unique combination of BIN and Group ID values present in subset B.

As part of the learning process, statistical analysis (e.g., frequency distribution analysis of data field values in subset A) is conducted. At step 170, the number of prescription transaction data records for each unique combination of data field values is counted. This step yields, for example, a frequency distribution of the unique BIN and Group ID combinations across the prescription data records in subset A (step 170 a).

Using the frequency distribution results of step 170, data field value combinations that have only a few associated prescription transaction data records may be dropped from further consideration in the learning process. For example, data field value combinations that have frequency counts that are less than a suitable cutoff number X may be dropped from further consideration. The suitable cutoff number X may be selected on the basis of statistical theories of sample size, or empirically by trial and error use of procedure 100. Only data record groups corresponding to data field combinations that have an occurrence frequency greater than the cutoff limit X may be considered to have sample sizes of sufficient statistical significance.
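For illustration, the counting of step 170 and the application of cutoff X might be sketched as follows. The key names are hypothetical, and the handling of a count exactly equal to X (here dropped) is an assumption, since the description speaks both of counts "less than" X being dropped and of counts "greater than" X being retained.

```python
from collections import Counter


def combination_counts(subset_a):
    """Count subset A records for each unique (BIN, Group ID) combination."""
    return Counter((r["bin"], r["group_id"]) for r in subset_a)


def significant_combinations(subset_a, cutoff_x):
    """Keep combinations whose record count is strictly greater than X."""
    counts = combination_counts(subset_a)
    return {combo: n for combo, n in counts.items() if n > cutoff_x}


subset_a = [
    {"bin": "123456", "group_id": "G1"},
    {"bin": "123456", "group_id": "G1"},
    {"bin": "123456", "group_id": "G1"},
    {"bin": "654321", "group_id": "G2"},
]
kept = significant_combinations(subset_a, cutoff_x=2)
```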

At steps 180 and 180 a, these remaining data field combinations in subset A are analyzed to determine the frequency distribution of associated prescription data records by the MR payer ID to which the data records are mapped. For each unique data field value combination (e.g., BIN and Group ID combination) and each mapped MR payer ID, the percentage of prescription data records mapped to the MR payer ID is computed.

As in the case of the distribution of data field combinations (steps 170 and 170 a), MR payer IDs that have only a few associated prescription transaction data records may be dropped from further consideration in the learning process. For example, at step 190 prescription data records that are associated with MR payer IDs that occur with a frequency of less than cutoff limit Y % may be dropped from consideration. Like the cutoff number X, the cutoff limit Y % may be selected on the basis of statistical theories on sample size or empirically by trial and error.

Further at step 190, each combination of unique data field values (e.g., BIN, Group ID and/or PCN) and its mapped MR payer ID, which has an occurrence frequency greater than Y %, is established as a “mapping bridge.” These mapping bridges represent the mapping information learnt from subset A. Such learnt mapping bridges may be applied to data records with unrecognized DS codes and other supplier indicia to map or associate these data records to MR payer IDs. The learnt mapping bridges may, for example, be assembled or listed in a bridge table so that they can be utilized in the manner of a lookup table in automated data processing arrangements. The mapping bridges may first be validated at optional step 200, for example, by manually verifying the correlation of the specific combinations of BIN, Group ID and/or PCN and specific MR payer IDs with selected data suppliers.
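Steps 160 through 190 can be sketched together as follows, purely as an illustration. The function name, key names, and sample values are hypothetical. Where more than one MR payer ID could exceed Y % for the same combination, this sketch keeps only the most frequent one, following the preferred embodiment described in the summary; that choice is an assumption rather than a requirement of the procedure.

```python
from collections import Counter, defaultdict


def learn_bridges(subset_a, cutoff_x, cutoff_y_pct):
    """Learn a bridge table {(BIN, Group ID): MR payer ID} from subset A."""
    by_combo = defaultdict(Counter)
    for r in subset_a:
        by_combo[(r["bin"], r["group_id"])][r["mr_payer_id"]] += 1

    bridges = {}
    for combo, id_counts in by_combo.items():
        total = sum(id_counts.values())
        if total <= cutoff_x:
            continue  # too few records for this combination (cutoff X)
        payer_id, n = id_counts.most_common(1)[0]
        if 100.0 * n / total > cutoff_y_pct:  # cutoff Y %
            bridges[combo] = payer_id
    return bridges


def rec(bin_, group_id, mr_payer_id):
    """Build a sample subset A record (illustrative values only)."""
    return {"bin": bin_, "group_id": group_id, "mr_payer_id": mr_payer_id}


subset_a = (
    [rec("123456", "G1", "MR0001")] * 4
    + [rec("123456", "G1", "MR0002")]
    + [rec("123456", "G2", "MR0003")] * 3
    + [rec("123456", "G2", "MR0004")] * 2
    + [rec("654321", "G9", "MR0005")]
)
bridge_table = learn_bridges(subset_a, cutoff_x=2, cutoff_y_pct=70.0)
```

In this sample, the first combination is bridged because 4 of its 5 records (80%) carry the same MR payer ID; the second reaches only 60% and the third has a single record, so neither yields a bridge.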

Steps 210-240, shown in FIG. 1 b, are exemplary steps that show the application of the learnt mapping bridges to address the problem of mapping the prescription transaction data records in subset B (or other data files) that have unrecognized DS or other supplier indicia. At step 210, a determination is made of which BIN and Group ID value combinations in subset B have corresponding MR payer ID entries in the mapping bridge table. All the prescription transaction data records associated or binned with a specific BIN/Group ID (e.g., step 160) may then be assigned the corresponding MR payer ID entry shown in the mapping bridge table.
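The lookup of step 210 might be sketched as follows, assuming a bridge table held as a dictionary keyed by (BIN, Group ID) tuples. The function and key names are hypothetical.

```python
def apply_bridges(subset_b, bridge_table):
    """Assign MR payer IDs to subset B records that match a mapping bridge.

    Returns (mapped, remaining): copies of the matched records carrying the
    bridged MR payer ID, and the records whose combination has no bridge.
    """
    mapped, remaining = [], []
    for r in subset_b:
        payer_id = bridge_table.get((r["bin"], r["group_id"]))
        if payer_id is None:
            remaining.append(r)
        else:
            mapped.append({**r, "mr_payer_id": payer_id})
    return mapped, remaining


bridge_table = {("123456", "G1"): "MR0001"}
subset_b = [
    {"bin": "123456", "group_id": "G1", "ds": "zz99", "mr_payer_id": "999999"},
    {"bin": "654321", "group_id": "G2", "ds": "qq11", "mr_payer_id": "999999"},
]
mapped, remaining = apply_bridges(subset_b, bridge_table)
```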

At steps 220 and 220 a, the remaining BIN and Group ID value combinations, which do not have entries in the mapping bridge table, may be further analyzed to extract additional information. The number of prescription transaction data records associated with each of the remaining BIN and Group ID value combinations may be counted. For each of the remaining combinations associated with a large number of prescription transaction data records, the distribution of data records by data supplier may be computed. At step 230, selected data suppliers may, for example, be requested to assist in decoding or resolving DS codes and other supplier indicia so that these can be correlated to the MR payer IDs. At step 240, newly resolved BIN/Group ID and MR payer ID combinations may be added to the mapping bridge table for future use.
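As an illustrative sketch of steps 220 and 220 a, the unresolved combinations may be counted and broken down by data supplier as shown below. The supplier key and the min_records threshold, which stands in for "a large number" of records, are hypothetical.

```python
from collections import Counter, defaultdict


def supplier_distribution(remaining, min_records):
    """For each unresolved (BIN, Group ID) combination with at least
    min_records records, return the record count per data supplier."""
    by_combo = defaultdict(Counter)
    for r in remaining:
        by_combo[(r["bin"], r["group_id"])][r["supplier"]] += 1
    return {
        combo: dict(counts)
        for combo, counts in by_combo.items()
        if sum(counts.values()) >= min_records
    }


remaining = [
    {"bin": "654321", "group_id": "G2", "supplier": "S1"},
    {"bin": "654321", "group_id": "G2", "supplier": "S1"},
    {"bin": "654321", "group_id": "G2", "supplier": "S2"},
    {"bin": "111111", "group_id": "G7", "supplier": "S2"},
]
to_resolve = supplier_distribution(remaining, min_records=2)
```

Such a breakdown could indicate which data supplier to approach first at step 230 for a given combination.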

It will be understood that the particular sequence of steps in learning procedure 100 described above is exemplary. The steps may be performed in any suitable sequence and particular steps may be omitted or modified. Alternate or additional steps may be added to procedure 100 as suitable or appropriate, for example, for modifications of procedure 100 which employ alternative statistical or matching models. Further, the steps of procedure 100 may be implemented in data processing arrangements using any suitable combination of computer hardware elements and software applications.

The foregoing merely illustrates the principles of the invention. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous techniques which, although not explicitly described herein, embody the principles of the invention and are thus within the spirit and scope of the invention.

1. A method for establishing mapping rules for assigning codes to prescription transaction data records, wherein each data record comprises at least a first, a second and a third data field, the method comprising: receiving a set of previously mapped data records in which codes have been assigned to the data records based on at least the values of the third data fields; determining a unique combination of first and second data field values present in the set of previously mapped data records; determining a code that is assigned to the data records having the unique combination of first and second data field values with an acceptable frequency; and storing a mapping rule that maps the code to the unique combination of first and second data field values, wherein the mapping rule is stored only for the unique combinations of the first and second data field values that occur with a frequency greater than a cutoff limit.
2. The method of claim 1, wherein each data record comprises at least a first, a second, third, and fourth data field, the method further comprising: determining a unique combination of first, second and fourth data field values present in the set of previously mapped data records; determining a code that is assigned to the data records having the unique combination of first, second, and fourth data field values with an acceptable frequency; and storing a mapping rule that maps the code to the unique combination of first, second, and fourth data field values, wherein the mapping rule is stored only for the unique combinations of the first, second, and fourth data field values that occur with a frequency greater than a cutoff limit.
3. A method for assigning codes to prescription transaction data records, comprising: receiving a first and a second subset of data records of the prescription transaction data records, wherein the prescription transaction records include at least a first, second, and third data field, wherein data records in the first subset have been previously assigned codes based on known mappings of the values in their respective third data fields, and wherein data records in the second subset have not been previously assigned codes; for each data record in the first subset, identifying data records in which at least one of the first and second data fields are populated with data values; determining unique combinations of data values of the first and second data fields present in the first subset; determining the occurrence frequency of each unique combination of the data values of the first and second data fields in the first subset; identifying the unique combinations of first and second data values that occur with a frequency greater than a cutoff limit X; for each of the identified unique combination of first and second data values, determining the frequency distribution of codes assigned to data records in the first subset having the corresponding unique combination; identifying codes in the frequency distribution for each unique combination having an occurrence frequency greater than a cutoff limit Y; and for each identified code, storing the combination of the first and second data values and the identified code as a mapping rule for mapping data records in the second subset of data records.
4. The method of claim 3, further comprising: assigning codes to data records in the second subset based at least in part on matching the first and second data values of the data records in the second subset with one of the unique combinations of data values and applying a corresponding mapping rule.
5. The method of claim 3, wherein the first subset includes historical data records.